ABSTRACT


Most significant improvements in algorithm performance are achieved by parallelism.
Large-scale parallelism at the inter-processor level is being used to obtain fast and efficient
parallel algorithms. In order to exploit the potential of parallel machines, one needs to
design parallel algorithms for the existing problems.

In this thesis, an attempt has been made to devise parallel algorithms for some graph-theoretic
problems. The first two problems are on directed acyclic networks and the third problem
is on undirected graphs. A parallel algorithm has been designed to evaluate a set of
algebraic expressions. This algorithm reduces the number of processors required for the
evaluation of the expressions. It takes O(log^2 n) time with O(n) processors,
where n is the number of unique computations. The second problem is the computation
of the shortest/longest distance on dags. Two algorithms were designed. The algorithm
designed for the general problem works with O(ne) processors, but the time bound of the
algorithm is not sub-logarithmic. The performance of the algorithm on a typical graph is
closer to O(log^ n). The second algorithm, which is for a class of directed acyclic graphs, computes
shortest and longest paths in O(log^ n) time with O(ne) processors. Both algorithms
make use of the modified tree merge technique and require the CREW PRAM model for
the computation. The final part of the thesis deals with a parallel algorithm for recognising
line graphs. An efficient algorithm has been designed for the PRAM model. It uses
O(n^) processors and O(log^ n) time.

Chapter 1


Introduction 

1.1 Parallel computing 

A myriad of tasks require fast computation, either to get real-time response or to process
large volumes of data. Parallel systems comprising multiple processors
can provide the support essential to meet these computational requirements. The kind of
parallel algorithm designed for a computational problem depends heavily on the nature
of the problem and the architecture used to compute the solutions. One obvious way to
devise a parallel algorithm is to exploit any inherent parallelism in an existing sequential
algorithm. However, blindly transforming a sequential algorithm to parallel form is not
always an effective solution. Some sequential algorithms have no obvious parallelisation.
Parallel algorithms made from such sequential algorithms will exhibit poor speedup. These
problems are not really inherently sequential. Parallel algorithms for these problems can be
developed in two different ways: one can invent a new parallel algorithm, or one can adapt
another parallel algorithm that solves a similar problem. If the sequential algorithm is not
particularly parallelisable, then one must be able to apply some external knowledge of the
problem in order to break the problem into computational tasks which can be executed
independently.

The performance of an algorithm can be drastically different on different architectures.
This is primarily due to communication or synchronisation overheads. One should not
ignore the communication costs or synchronisation costs in determining the complexity of a
parallel algorithm. At times, the non-computational cost can be higher than the computational
cost. In other words, more time is spent routing data among the processors or synchronising
the processors than performing the required computation.

In the design of parallel algorithms, the parallel time can be reduced by employing
more processors. A constant-factor increase in the number of processors can reduce the
parallel time only by a constant factor. If one attempts to reduce the parallel time by
more than a constant factor, then the number of processors employed for the task must grow
by more than a constant factor. In other words, the change in the number of processors employed
must be a function of the input size. Although parallel algorithms can be designed with any
number of processors, parallel algorithms requiring more than a polynomial number of
processors are not preferred. The class of problems which can be solved in polylogarithmic
time with polynomially many processors is called Nick's class. This class represents the
sequential polynomial-time algorithms which admit fast parallel algorithms with a reasonable number
of processors.

1.2 Models of computation

Computer architectures can be classified by the concept of instruction stream and data
stream. Depending upon the number of instruction and data streams used by the system,
it can be classified into one of the following four models:

• Single Instruction stream Single Data stream (SISD) 

• Single Instruction stream Multiple Data stream (SIMD) 

• Multiple Instruction stream Single Data stream (MISD) 

• Multiple Instruction stream Multiple Data stream (MIMD)

SIMD and MIMD are the most widely used models of parallel computation. An SIMD
computer consists of many identical processors. Each of these processors possesses its own
local memory where it can store data. All the processors operate under the control of a
single instruction stream issued by a central control unit. Equivalently, processors may be
assumed to hold identical copies of a single program, each processor's copy being stored
in its local memory. There are many data streams and each processor handles one data
stream.

The processors operate synchronously: at each step, all processors execute the same
instruction, each on a different datum. At times, it may be necessary to have only a
subset of the processors execute the instruction. This information can be encoded in the
instruction itself, thereby telling a processor whether it should be active or inactive. There is
a mechanism, such as a global clock, that ensures lock-step operation. Thus processors that
are inactive during an instruction, or those that complete execution of the instruction before
others, may stay idle until the next instruction is issued. The inter-processor communication
can be either via a shared memory or an inter-connection network.

The shared memory (SM) SIMD class is also known in the literature as the parallel random
access machine (PRAM) model. Processors share a common memory in the way a group of people
use a bulletin board for communication. When two processors wish to communicate, they do so through
the shared memory. The basic model allows all processors to gain access to the shared
memory simultaneously if the memory locations are different. SIMD SM can be further
divided into the following four subclasses:

• Exclusive Read Exclusive Write PRAM (EREW)

• Concurrent Read Exclusive Write PRAM (CREW)

• Exclusive Read Concurrent Write PRAM (ERCW)

• Concurrent Read Concurrent Write PRAM (CRCW) 

EREW PRAM Model

Access to memory locations is exclusive in this model. In other words, no two processors
are allowed to simultaneously read from or write into the same memory location. It is the
simplest shared memory model. Other shared memory models can be simulated on this
model.

CREW PRAM Model 

Multiple processors can simultaneously read from the same memory location, but no two
processors are allowed to write into the same memory location simultaneously.

ERCW PRAM Model 

Multiple processors are allowed to write into the same memory location, but the read
access remains exclusive, i.e., simultaneous reads from a memory location are not permissible.

CRCW PRAM Model

Both read and write privileges are granted. Two or more processors are allowed to read from or
write into the same memory location. There is no need to resolve concurrent read operations.
If two or more processors write into the same memory location, the write conflict must be
resolved. This model can be classified further based on the method followed to resolve the write
conflicts.
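
The four access rules can be illustrated by checking the memory accesses issued in one synchronous step. The following Python sketch is only an illustration under stated assumptions: the name `step_is_legal` and the representation of a step as two lists of addresses (one entry per accessing processor) are hypothetical, and reads and writes are checked independently of each other.

```python
from collections import Counter

def step_is_legal(model, reads, writes):
    """Return True if one synchronous step obeys the given PRAM variant.

    reads / writes: memory addresses accessed in this step, one entry per
    processor that reads / writes (hypothetical representation).
    """
    if model not in ("EREW", "CREW", "ERCW", "CRCW"):
        raise ValueError("unknown PRAM variant: " + model)
    # A concurrent access means some address appears more than once.
    concurrent_read = any(count > 1 for count in Counter(reads).values())
    concurrent_write = any(count > 1 for count in Counter(writes).values())
    reads_ok = model in ("CREW", "CRCW") or not concurrent_read
    writes_ok = model in ("ERCW", "CRCW") or not concurrent_write
    return reads_ok and writes_ok

# Two processors read address 1 at once: allowed on CREW, not on EREW.
print(step_is_legal("CREW", reads=[1, 1], writes=[2, 3]))
print(step_is_legal("EREW", reads=[1, 1], writes=[2, 3]))
```

The sketch does not model how a CRCW machine resolves a write conflict; it only reports whether the step is admissible.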


1.3 Algorithm performance 

The performance of a parallel algorithm is judged by its running time, the
number of processors used and the cost.

RUNNING TIME 

Since speeding up computation appears to be the main reason behind our interest in
building parallel computers, the most important measure in evaluating a parallel algorithm
is its running time. The running time of an algorithm is the time elapsed from the moment the
algorithm starts to the moment it terminates. Theoretical analysis of the running time is
done by counting the number of basic operations/steps executed by the algorithm in the
worst case. This yields an expression describing the number of such steps as a function of
the input and output size. A good indication of the quality of a parallel algorithm is the
speedup it produces.

SPEEDUP = worst-case seq. time / worst-case parallel time.

Thus, the greater the speedup, the better the algorithm.

NUMBER OF PROCESSORS 

The second most important criterion in evaluating a parallel algorithm is the number of
processors required to solve the problem. The larger the number of processors an algorithm uses
to solve a problem, the more expensive the solution becomes to obtain. The computational
time of a problem can be reduced by employing more processors, as long as there is scope
to employ more processors in the computational task.

COST 

The cost of a parallel algorithm is defined as the product of the previous two measures. 

Cost = parallel running time * number of processors.

Cost equals the number of steps executed collectively by all the processors in solving
a problem in the worst case. If the cost of the parallel algorithm for the problem matches
the lower bound on the number of sequential steps needed to solve the problem, within a
constant multiplicative factor, then the algorithm is said to be
cost optimal. When no optimal algorithm is known for solving a problem, the efficiency of
a parallel algorithm for that problem is used to evaluate the cost. This is defined as follows:

EFFICIENCY = worst-case seq. time of fastest algorithm / cost of parallel algorithm
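
As a small worked illustration of these three measures, consider adding n = 1024 numbers. The figures used below are assumptions chosen for the example: a sequential algorithm performing n - 1 additions, and a binary-tree reduction using n/2 processors for log n steps. The function names are hypothetical.

```python
import math

def speedup(seq_time, par_time):
    # SPEEDUP = worst-case sequential time / worst-case parallel time
    return seq_time / par_time

def cost(par_time, processors):
    # COST = parallel running time * number of processors
    return par_time * processors

def efficiency(best_seq_time, par_time, processors):
    # EFFICIENCY = worst-case time of fastest sequential algorithm / cost
    return best_seq_time / cost(par_time, processors)

n = 1024
seq_steps = n - 1               # assumed: n - 1 sequential additions
par_steps = int(math.log2(n))   # assumed: log n rounds of a tree reduction
procs = n // 2                  # assumed: one processor per pair in round one

print(speedup(seq_steps, par_steps))            # 102.3
print(cost(par_steps, procs))                   # 5120
print(efficiency(seq_steps, par_steps, procs))  # about 0.2
```

Under these assumptions the cost (5120) exceeds the sequential step count (1023) by a factor of about log n / 2, so this particular reduction is fast but not cost optimal.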

1.4 Measures of complexity 

Sequential models of computation are equivalent within a polynomial amount of time. As
a consequence, computability is insensitive to the choice of the model.

A similar claim holds for parallel models of computation. All universal models of
expanding parallelism can be shown to be polynomial-time equivalent. Each model can be
simulated by the others with at most a polynomial loss of time. In particular, CRCW,
CREW and ERCW can be simulated using the EREW model in O(log n) time. Nick's class is
robust. The class of the problem remains the same in all the shared memory models.

Although the PRAM model ignores many important aspects of real parallel machines,
the essential attributes of a parallel algorithm tend to transcend the models for which they
are designed. If one PRAM algorithm outperforms another PRAM algorithm, the relative
performance is not likely to change substantially when both algorithms are adapted to run
on a real computer.

The running time of a parallel algorithm depends on the number of processors executing
the algorithm as well as the size of the problem input. Generally, therefore, we must discuss
both time and processor count when analysing PRAM algorithms; typically, there is a
trade-off between the number of processors used by an algorithm and its running time.

1.5 Thesis outline 

Chapter 2 gives an outline of the methods to evaluate sets of algebraic expressions.
Sequential algorithms for the evaluation of an arithmetic expression take O(n) time. An optimal
parallel algorithm is available for the problem. It needs O(log n) time. Our attempt is to design
a parallel algorithm which can evaluate a set of expressions with fewer processors.
It eliminates duplicate computation in the case of common sub-expressions among the
various expressions of the set.

Chapter 3 deals with the problem of critical path identification on AOE networks. Though
this chapter deals with the problem of critical path computation, the results of this chapter
are relevant for shortest and longest path computation on dag networks. This problem
takes O(n + e) time on a sequential machine. There are known parallel algorithms which
take O(log^ n) time with O(n^) processors. An attempt was made to reduce the number of
processors required for this computation. The method devised assigns one processor for
each computation in the dag structure used for the computation of the set of expressions.

Chapter 4 deals with the design of a parallel algorithm for recognising line graphs and
constructing the root graph from the given line graph. The best known sequential algorithm for
this problem takes O(n + e) time. An attempt has been made here to construct a parallel
algorithm which can recognise line graphs and reconstruct the root graph of the line graph.
As a first step, a PRAM algorithm was developed with a minimal number of processors. This
algorithm takes O(log^ n) time to construct the root graph from the line graph. It requires
O(n^) processors to construct the root graph.

Finally, chapter 5 briefly presents the conclusions of the thesis.



Chapter 2 


Expression dags 

2.1 Introduction 

Evaluation of arithmetic expressions is one of the well understood methods for solving
problems. A sequential algorithm to evaluate an arithmetic expression takes O(n) time, where
n is the number of binary operations in the expression. Sequential algorithms evaluate the
expression by evaluating the sub-expressions in a bottom-up fashion on the parse tree of the
expression. Since the number of computations to be done is n, irrespective of the order in
which the sub-expressions are evaluated, the sequential algorithm would take O(n) time.
In the parse tree of an expression, each node represents a sub-expression to be evaluated
and each edge of the tree represents a data dependency between two sub-expressions in the
tree. The data dependencies among the sub-expressions imply that on any parallel machine
the arithmetic expression evaluation would take Ω(h) time, where h is the height
of the tree. The height of a binary tree of n nodes can vary from log n to n, but it
cannot be less than log n. This observation gives an immediate lower bound of Ω(log n)
for the evaluation of arithmetic expressions. On a parallel machine, arithmetic expressions
can be evaluated in two different ways. The expressions can be evaluated in parallel either by
rewriting the expression in such a way that the height of the expression tree is more than the
minimum possible by at most a constant factor, or by evaluating the expression as a whole such that
the size of the expression reduces by a constant factor after each iteration of the parallel
algorithm. Both methods give the best possible time bound for the parallel evaluation
of the expression tree. But the time bound proved in the case of the height reduction algorithm
is asymptotic.
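
The bottom-up sequential evaluation and the role of the tree height can be sketched as follows. This is a minimal illustration, not the thesis implementation: the tuple encoding `(operator, left, right)` of a parse tree and the function names are assumptions made for the example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Bottom-up evaluation; performs one operation per internal node."""
    if not isinstance(tree, tuple):
        return tree                      # a constant at a leaf
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

def height(tree):
    """Length of the longest dependency chain, i.e. the parallel lower bound."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[1]), height(tree[2]))

balanced = ("*", ("+", 1, 2), ("+", 3, 4))    # three operations, height 2
skewed = ("+", ("+", ("+", 1, 2), 3), 4)      # three operations, height 3

print(evaluate(balanced), height(balanced))   # 21 2
print(evaluate(skewed), height(skewed))       # 10 3
```

Both trees contain three binary operations, but the skewed tree forces three dependent steps, whereas the two additions of the balanced tree could be carried out at the same time.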

2.2 Height reduction method 

R. P. Brent, Kuck and Maruyama [BK 73], and F. P. Preparata and D. E. Muller [PM 75] have
designed algorithms which are based on the height reduction method. These algorithms try
to rewrite the expressions into an equivalent form where the height of the tree is bounded
by O(log n). In order to rewrite the expression into an equivalent form, the algebraic laws
are used. The evaluation is carried out in two stages. In the first stage of the algorithm,
the expression to be evaluated is rewritten in such a way that the height of the parse tree is
bounded by O(log n). In the second stage, the new expression is evaluated in bottom-up
order; sub-expressions which are at the same level are evaluated in parallel. Since the
bound on the height of the expression tree is achieved only for higher values of n, the time
bound of log n with O(n) processors is achieved only when the number of binary operations
in the expression is sufficiently high. In other words, the time bound achieved
is asymptotic. Algebraic laws are used to reduce the depth of the expression tree. The
arithmetic expressions are repeatedly rewritten to equivalent expressions whose depth is
progressively less.

Figure 2.1: Decision Tree

This algorithm recursively restructures the given expression E and returns its equivalent
expression, whose depth is bounded by O(log n). This ensures that the expression can be
evaluated faster on a parallel machine. The first part of the algorithm, which restructures
the expression, is a recursive sequential algorithm which takes O(n^) time to restructure
the tree. The second part of the algorithm is the parallel evaluation of the restructured
expression. The first part of the algorithm is a realisation of the decision tree given in figure 2.1.
In each step of the restructuring, a sub-expression X of size N_k is identified in the expression
such that the expression can be rewritten into AX + B, where the sizes of A, B, X are such
that the overall expression can be evaluated in time O(k). The sub-expressions A, B, X
are themselves recursively restructured to get the required depth. {N_k} in this algorithm
represents a sequence of integers satisfying the recurrence relation N_k = N_{k-1} + N_{k-4} + 1
with N_j = j + 1 for j = 0, 1, 2, and 3.

This algorithm puts an upper bound k on the depth of the expression E, i.e., t(E) ≤ k
when |E| ≤ N_k. Since the largest positive root of the characteristic equation of the
recurrence relation is λ_0 = 1.3802, the height of the tree after rewriting will be
t(E) ≤ 2.1507 log |E| + c. A brief description of the algorithm is given below:

In each step of the algorithm, it finds a sub-expression X of the expression E and
rewrites it into E* = AX + B such that the sizes of the resulting sub-expressions are
bounded in terms of N_{k-1}, where |E| ≤ N_k.

This ensures the convergence of the algorithm after a finite number of steps.
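
The constants 1.3802 and 2.1507 quoted above can be checked numerically. The sketch below assumes that the recurrence is N_k = N_{k-1} + N_{k-4} + 1 with N_j = j + 1 for j = 0, 1, 2, 3; this reading is an assumption, chosen because its characteristic equation z^4 = z^3 + 1 reproduces both constants. The function names are hypothetical.

```python
import math

def n_sequence(k_max):
    """First k_max + 1 terms of the assumed recurrence N_k = N_{k-1} + N_{k-4} + 1."""
    terms = [1, 2, 3, 4]                 # N_j = j + 1 for j = 0..3
    for k in range(4, k_max + 1):
        terms.append(terms[k - 1] + terms[k - 4] + 1)
    return terms[:k_max + 1]

def largest_root(lo=1.0, hi=2.0, iterations=100):
    """Bisection for the positive root of z^4 - z^3 - 1 = 0 in [1, 2]."""
    f = lambda z: z ** 4 - z ** 3 - 1
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

root = largest_root()
print(n_sequence(8))            # [1, 2, 3, 4, 6, 9, 13, 18, 25]
print(round(root, 3))           # 1.38
print(1 / math.log2(root))      # close to 2.1507
```

Since N_k grows like a constant times root^k, the condition |E| ≤ N_k gives k of about log |E| / log root, which is where the factor 1 / log2(root) in the depth bound comes from.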

Algorithm ST-REDT 
STEP 1

Find a subexpression Xi = of , for some operation 0 , such that 

\Xt |>A*-a-J\r*-3 + Afc-* + l 
iXi 1> 

I Rl l> JVjt-l “ E‘k-2 + -^i-4 *ttd 

STEP 2 

K I lii |< Nk -9 wt JET <= ^ and halt 

STEP 3 

If ^ is + set JT <= AJiq + (Ajli + B)* and halt 

STEP 4 

Find a subexpression Xq — L^S'RiofRi for some operation 0 such that

I Aa |> Nh-4+I I \< I Sa \< Nk-i 

STEP 5 

A* If 1 Xi |< set 4= (A 1 X 1 B 2 + ft)* + aj(A5(i;;a5)) 

B. H Nk-k <1 Xr 1< jr *_4 set E* ^ (AiXiBa + Bi)* + (A^ XJXAJAJ) 

C. If I Xi > JV*-.s + 1 set E* 4= (Aiilft + ft)* + X;(A5(AfA5)) Halt.

2.3 Tree contraction method

Miller and Reif [MR 86], and Kosaraju and Delcher [KD 88], have proposed tree contraction
methods to evaluate any arithmetic expression. Tree contraction methods use the divide and
conquer paradigm for the parallel evaluation of an expression. At every iteration of the
algorithm, the number of nodes which are removed from the tree is a constant fraction of the total
number of nodes in the tree. Unlike the tree height reduction algorithms, tree contraction
algorithms do not try to modify the structure of the expression to be evaluated. These
algorithms have many advantages over the height reduction algorithms. Contraction algorithms
are easier to implement and their empirical performance is better than that of the height reduction
algorithms. Tree contraction algorithms do not require restructuring of the expression and
hence can be applied to any expression without sequential pre-processing.

Kosaraju and Delcher's algorithm evaluates the expression in O(log n) time by employing
O(n) processors. The number of processors required for the computation can be reduced
by a factor of log n by applying Brent's [BR 74] theorem for a set of associative operations.
This algorithm identifies a set of nodes to be removed at every iteration of the algorithm
and carries out a set of local operations for each removed node. The outgoing edges and
the linear expressions of some of the nodes are modified in such a way that the resulting
expression produces the same result as the original tree. At every iteration of the algorithm,
the leaf nodes are numbered in in-order and the even numbered leaf nodes are marked
for removal from the expression tree. Each marked leaf node and its parent are removed
from the tree. The marked nodes are removed in two steps: left leaf nodes are removed
first, then the right leaves are removed. This two-stage elimination ensures that there is no
concurrent read or write in the memory locations accessed by the processors. The parent node
of each leaf node is exclusively accessed by that node, hence it cannot result in a concurrent read or
write. However, when two consecutive left nodes or two consecutive right nodes are removed
from the tree at the same time, they may try to access the same grandparent node. Although the
grandparent nodes are the same, they access different data for the local computation. Hence,
the grandparent access does not result in a concurrent read or write operation. Processor
assignment for this algorithm is simple: one processor is assigned for each leaf node in the tree
and any computation pertaining to that leaf node is done by the processor. Each processor
will be active only if the corresponding leaf node is still in the tree. This method gives an
optimal algorithm for expression evaluation on the EREW PRAM model.

2.4 Evaluation of set of expressions 

If a set of expressions is to be evaluated at the same time, one need not compute the
common sub-expressions repeatedly. The computational cost can be reduced by reducing
the number of processors employed for the algorithm. This can be achieved when each common
sub-expression is evaluated just once during the evaluation of the set of expressions. The
computations to be done can be represented in the form of a directed acyclic graph (dag).
The parallel algorithm described here gives a way of computing the expressions of the dag
structure. This algorithm assumes that there is no common sub-expression within an
expression. It is shown here that a set of arithmetic expressions with n unique sub-expressions
can be evaluated on a CREW PRAM model in O(log^2 n) time with O(n) processors. This
algorithm can be applied for any two operators ⊕ and ⊗ which are commutative and where ⊗ is
distributive over ⊕. Although it is assumed that the operators are commutative, the same
results can be proved without the commutative property of the operators. For brevity,
it is assumed that the expressions consist of just constants, integer addition and integer
multiplication. The algorithm can be applied for other arithmetic operators, i.e., any operators
which obey the multiplicative and additive properties.

2.4.1 DAG Structure 

The set of arithmetic expressions to be evaluated is specified in the form of a directed
acyclic network. Each node of this network corresponds to a sub-expression to be evaluated.
The network contains two types of nodes, internal and terminal. Nodes with out-degree two are
internal nodes and nodes with out-degree zero are terminal nodes. Each internal node of
the network contains a binary operator while each terminal node of the network contains a
constant value. The graph induced by the set of nodes which are reachable from a particular
node is a tree and it is the expression tree of the sub-expression corresponding to that
node. Each node of the network has memory space to store the value of the sub-expression
and each internal node stores the operator which corresponds to that node. Each internal node is
connected to two nodes, the left and right nodes, which represent the left and right operands
of the expression corresponding to the node. Edges in the network specify the interrelation
between the sub-expressions in the network. An edge between any two nodes in the network
indicates that the value of the node at the tail of the edge depends on the value of the node
at the head of the edge. Each edge of the network stores a linear expression in the form of
a pair. It represents the contribution of the left or right sub-result towards the higher level
sub-expression. The linear expression is called the contribution vector. The contribution vectors
are modified in such a way that the expression dag at all points of time represents the same
values as the original expressions. All the contribution vectors are initialised to [1, 0] at the
start of the evaluation process. With this initial value, the network represents exactly the
expressions of the set. This initial value depends upon the operators which are used in the
expressions. After each iteration of the algorithm, some nodes of the network are removed and
the contribution vectors are modified for the remaining edges of the network in such a way that
the values of the remaining nodes in the network are unaffected. One processor is assigned to each
of the nodes in the network and the processor will be active whenever a computation is done for the
corresponding node in the tree. This algorithm selects a set of non-overlapping nodes by
constructing adjacency trees for the dag structure instead of the in-order method which is
normally used for tree-structured computations.
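
The network described above could be represented as in the following sketch, which builds one shared node per distinct sub-expression and initialises every contribution vector to [1, 0]. The thesis takes the network as given; the construction by structural sharing, the tuple encoding `(operator, left, right)` and all names below are assumptions made for illustration. Operands are not reordered, so a + b and b + a are treated as different sub-expressions.

```python
def build_dag(expressions):
    """Return (roots, nodes); nodes is a list of dicts indexed by node id."""
    index = {}    # structural key -> node id, so equal sub-expressions share a node
    nodes = []

    def intern(expr):
        if isinstance(expr, tuple):
            op, left, right = expr
            key = (op, intern(left), intern(right))
        else:
            key = ("const", expr)
        if key not in index:
            index[key] = len(nodes)
            if key[0] == "const":
                nodes.append({"op": None, "value": expr})          # terminal node
            else:
                nodes.append({"op": key[0],
                              "left": key[1], "right": key[2],
                              "left_edge": [1, 0],                  # a = 1, b = 0
                              "right_edge": [1, 0]})
        return index[key]

    roots = [intern(expr) for expr in expressions]
    return roots, nodes

# (1 + 2) * 3 and (1 + 2) + 4 share the sub-expression 1 + 2.
roots, nodes = build_dag([("*", ("+", 1, 2), 3), ("+", ("+", 1, 2), 4)])
print(len(nodes))   # 7 nodes instead of the 10 needed for two separate trees
```

Here the shared node for 1 + 2 is evaluated once, which is the saving in processors that the dag formulation aims at.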

2.4.2 Evaluation procedure 

Each edge in the network stores a linear expression aX + b, which is denoted by the pair
[a, b]. X is the value of the node at the head of the edge. The value at a node in the network is
defined recursively. For any node u with operator ⊙, if the values at its two sub-expression
nodes are VAL_L and VAL_R and the expressions at the left and right edges are (a_L X + b_L) and
(a_R X + b_R), respectively, then the value at u is defined to be the result of the expression
(a_L VAL_L + b_L) ⊙ (a_R VAL_R + b_R)

The expressions at the edges which point to u indicate the contribution of the value at u
to the value of the nodes at the higher levels. At each phase of the algorithm, we ensure that
the value at every node of the network defined in this way equals the value of the original
sub-expression. To make this true initially, we begin with the expression X, i.e. a_e = 1 and
b_e = 0, stored at every edge e of the network.
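
The recursive definition of the value at a node can be written down directly. The sketch below is illustrative only: the dictionary layout of a node and the helper names `leaf`, `internal` and `value` are assumptions, and only integer addition and multiplication are supported.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def leaf(constant):
    return {"op": None, "const": constant}

def internal(op, left, right, left_edge=(1, 0), right_edge=(1, 0)):
    # Each edge carries a pair (a, b) standing for the linear expression aX + b.
    return {"op": op, "left": left, "right": right,
            "left_edge": left_edge, "right_edge": right_edge}

def value(node):
    """(a_L * VAL_L + b_L) op (a_R * VAL_R + b_R), applied recursively."""
    if node["op"] is None:
        return node["const"]
    a_l, b_l = node["left_edge"]
    a_r, b_r = node["right_edge"]
    left = a_l * value(node["left"]) + b_l
    right = a_r * value(node["right"]) + b_r
    return OPS[node["op"]](left, right)

# With the initial pairs (1, 0) the value is that of the plain expression.
print(value(internal("+", leaf(3), leaf(4))))                      # 7
# A modified left edge (2, 1) turns the left operand 3 into 2*3 + 1 = 7.
print(value(internal("*", leaf(3), leaf(4), left_edge=(2, 1))))    # 28
```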

The algorithm works in two phases. In each step of the first phase, the dag is contracted
by removing some of the terminal and internal nodes of the dag structure and the expressions
are partially evaluated. In the second phase of the algorithm the expressions are evaluated.
Rake and clip are used during the first phase of the algorithm to contract the expression
dag. The rake operation can be applied to any internal node with at least one terminal node.
Clip can be applied only if both the left and right nodes are terminal nodes.

Rake operation 

The rake operation can be applied to any node with one or more terminal nodes. The rake
operation removes the concerned internal node and its specified terminal node. Two
consecutive rake operations can result in data loss. In order to ensure the correctness of
the computation, the rake operation is applied only after a marking which avoids overlapping
rake operations and generates an optimal number of rake operations. To apply the rake
operation to an internal node and its terminal node w we have to

1. disconnect w and the internal node from the network, so that the sibling of w becomes the
son of the parent of the internal node.

2. compute [a, b] for the newly established connection between the parent of the internal
node and the sibling of w. [a, b] can be computed by computing the coefficient of X
and the constant term of the following expression, in which X is the value of the sibling of w.

if w is a left node: a_par ((a_w c_w + b_w) ⊙ (a_sib X + b_sib)) + b_par

if w is a right node: a_par ((a_sib X + b_sib) ⊙ (a_w c_w + b_w)) + b_par

Here ⊙ is the operator of the internal node, c_w is the constant at w, [a_w, b_w] and
[a_sib, b_sib] are the pairs on the edges from the internal node to w and to its sibling,
and [a_par, b_par] is the pair on the edge from the parent to the internal node.
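
For integer addition and multiplication, the new pair produced by a rake can be worked out in closed form. The following sketch assumes that the edge pairs compose as linear expressions in the way described above; the function name `rake` and its argument names are hypothetical, and because both operators are commutative the left and right cases coincide here.

```python
def rake(op, par_edge, leaf_edge, leaf_const, sib_edge):
    """Pair (a, b) for the new edge from the parent to the sibling of the raked leaf.

    par_edge:  pair on the edge from the parent to the internal node
    leaf_edge: pair on the edge from the internal node to the terminal node w
    sib_edge:  pair on the edge from the internal node to the sibling of w
    """
    a_par, b_par = par_edge
    a_w, b_w = leaf_edge
    a_sib, b_sib = sib_edge
    k = a_w * leaf_const + b_w           # contribution of the removed leaf
    if op == "+":
        # a_par * (k + a_sib * X + b_sib) + b_par
        return (a_par * a_sib, a_par * (k + b_sib) + b_par)
    if op == "*":
        # a_par * (k * (a_sib * X + b_sib)) + b_par
        return (a_par * k * a_sib, a_par * k * b_sib + b_par)
    raise ValueError("unsupported operator: " + op)

# Check that the parent sees the same contribution before and after the rake.
x = 4                                            # value of the sibling
a, b = rake("*", (2, 1), (1, 0), 3, (1, 5))
print(a * x + b, 2 * (3 * (1 * x + 5)) + 1)      # 55 55
```

Each rake touches a fixed number of pairs, which is consistent with the constant time claimed for the local operations.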

Clip operation 

The clip operation is applied only to the internal nodes of type LR. An internal node
becomes a fully computed node as soon as the clip operation is applied to it. Any internal node
which is subjected to the clip operation is not included in the dependency tree construction.
Only the partially evaluated nodes generated by the rake operations are included in the
dependency tree. The clip operation simply evaluates the node value, stores the result of
the computation in the internal node, and disconnects its left and right nodes from the
node.

Dependency trees are constructed during the first phase of the algorithm. If an internal
node with one leaf node is removed from the dag, then the evaluation of the internal node cannot
be completed until the value of its other operand is known. Each step of the contraction
operation generates certain dependencies. These are recorded in the form of a tree. Since
the number of contraction steps executed is bounded by log n, the height of the dependency
tree cannot be greater than log n. The expressions of the dag structure are evaluated in the
second phase of the algorithm in the order specified by the dependency tree.

Non-overlapping internal nodes are identified by constructing a set of adjacency trees.
These trees represent the concurrent data requirement for the removal operations executed
at various internal nodes in the dag structure. This partitions the set of nodes in the dag
into many groups, where the nodes of an adjacency tree form a group. The nodes of each group are
split into two non-overlapping sets and the set having at least half of the total
number of nodes in the group is marked for removal. The nodes of each such set are removed from
the dag and the dag structure is contracted. As a first step in the process of constructing
adjacency trees, each internal node in the network is assigned a label depending upon the
terminals connected to it. The possible labels are NONE, LEFT, RIGHT and LR. These
labels represent the potential local operations for the internal nodes. Each node in the
dag is represented by a node in the adjacency trees. If a node is LEFT or RIGHT then it is
connected to the node of its right or left node, respectively, in the trees and a value 1 is stored
in the node. If a node is a NONE or LR node then it is connected to itself and a value 0
is stored in the node. The distance of the nodes from the root of their trees is computed using
the pointer doubling technique. The nodes of a tree are split into two sets: nodes at even distance
from the root node and nodes at odd distance from the root node. For each group, the set
with the maximum number of nodes is selected and the nodes of this set are eliminated from
the dag structure in the next phase of the contraction operation. Labels are assigned
for each of the internal nodes in the dag, and the contraction operation is applied to the
LR internal nodes along with their left terminals. The contraction step is repeated till
the tree contains only unconnected internal nodes. At this stage the second phase of the
algorithm starts. In the second stage of the algorithm the expressions are evaluated in the
order specified by the dependency tree. The second phase of the algorithm is necessary if
one wants to evaluate all the expressions in the set.

Algorithm 
PHASE I: 

for i = 0 to log n do

1. Set the node type for each of the internal nodes.

2. Apply the clip operation to the internal nodes of type LR.

3. Set the node type for modified internal nodes.

4. Mark a set of non-overlapping internal nodes in each tree.

5. Rake the marked internal nodes and their terminal nodes and recompute the linear
expressions for the siblings of the terminal nodes which are removed.

PHASE II:

for i = log n down to 0 do

compute expressions of nodes at height i in the dependency tree

To mark a set of non-overlapping internal nodes in each tree one may do the following:

1. Partition the network into many groups. If two adjacent nodes are LEFT or RIGHT
type internal nodes then they are put in the same partition. The nodes of each partition
and the edges between them form an in-tree.

2. Compute the distance of each node from the root node of its adjacency tree.

3. Split the nodes in a partition into two groups. Even distance nodes and odd distance
nodes are gathered separately.

4. In each partition, the bigger group is marked for the rake operation.
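
The marking steps above can be sketched as follows. This is a minimal sequential
simulation, not the thesis implementation: the array name parent, the convention that a
root points to itself, and the tie-break in favour of the even group are all assumptions
made for illustration. Each list comprehension stands for one synchronous PRAM step in
which every node is handled by its own processor.

```python
def distances_by_pointer_doubling(parent):
    """Return (dist, root): dist[v] is the edge distance of v from its tree root.

    parent[v] is the parent of v in its adjacency tree; a root points to itself
    (an assumed convention for this sketch).
    """
    n = len(parent)
    # dist[v] = number of edges spanned by the current pointer of v.
    dist = [0 if parent[v] == v else 1 for v in range(n)]
    ptr = list(parent)
    # ceil(log2 n) rounds are enough: each round doubles the spanned distance.
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        # Both new arrays are computed from the old ptr, as in one PRAM step.
        dist = [dist[v] + dist[ptr[v]] for v in range(n)]
        ptr = [ptr[ptr[v]] for v in range(n)]
    return dist, ptr


def mark_for_rake(parent):
    """Mark, in every tree, the larger of the even-distance and odd-distance groups."""
    dist, root = distances_by_pointer_doubling(parent)
    groups = {}
    for v in range(len(parent)):
        groups.setdefault(root[v], ([], []))[dist[v] % 2].append(v)
    marked = []
    for even, odd in groups.values():
        # Ties go to the even group (an arbitrary choice for this sketch).
        marked.extend(even if len(even) >= len(odd) else odd)
    return sorted(marked)
```

Two nodes with the same distance parity are never parent and child, which is why the
marked nodes of a tree do not overlap.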

In the first phase of the algorithm, the linear expressions are recalculated after every
rake step. This step ensures that the values of the subexpressions which are present in the dag
are unaffected throughout the computation. In the second phase of the algorithm, the
linear expressions are used to calculate the exact result of the remaining subexpressions.

The CREW model is used for the computation. None of the operations requires a concurrent
write operation. The concurrent read is done in the tree doubling operation. The processor
assignment is simple in this algorithm. One processor is assigned for each node in the directed
acyclic structure, and all the operations pertaining to any node are done by its processor.
This implies that the number of processors required for the computation is exactly n.

The local operations, Rake and Clip, which are done to contract the dag take O(1) time
because they need to do only a fixed amount of work for each application of the
operation. The adjacency tree construction can be done in a constant amount of time. The
process of identifying the nodes for the rake operation takes O(log n) time because this involves
computation of the distance of the nodes from their root nodes and the sorting of nodes on the
basis of root node and distance. The contraction operation is executed log n times. Hence,
the first phase of the algorithm takes O(log^2 n) time. The second phase of the algorithm
involves computation of expressions. This requires a constant amount of time.



Chapter 3 


Computation on a dag

3.1 Overview 

Computation of the shortest and longest paths on a directed network is one of the extensively
studied graph theoretic problems. Many sequential and parallel algorithms have been
designed for this problem. Computation of the shortest or longest distance on a network takes
O(n + e) time on a single processor machine. Many parallel algorithms have been designed
with polylog parallel time. Dekel et al. [DN 81] proposed a parallel algorithm of O(log^2 n)
time complexity for computing the shortest distance on a non-negatively weighted network. This
algorithm can be executed on perfect shuffle and cube connected cycle computers. Chaudhuri and
Ghosh [CG 86] have proposed a parallel algorithm which requires an O(log d loglog n) time
bound where d is the diameter of the network (i.e., the number of edges in the longest path
from the start node to the terminal node of the network). Chaudhuri has proposed a
distributed algorithm to analyse the dag structure for critical activities. All these algorithms
employ O(n^3) processors to compute the result. Chaudhuri [CH 80] has presented an adaptive
parallel algorithm for analysing aoe networks on an SIMD-SM computer without read or
write conflicts. This algorithm assumes a bounded parallelism. The time bound achieved
by the algorithm and the number of processors it uses are expressed in terms of a parameter h,
lying between 0 and 1 (h < 1), that depends upon the number of processors available.

The computational cost of these [DN 83, CG 86, CH 90]
parallel algorithms is much higher than the cost of the sequential algorithm. The known
parallel algorithms employ O(n^3) processors to get polylog time performance. We attempted
to reduce the number of processors employed for the directed acyclic graphs. The
computations on the directed acyclic graphs have many applications. Topological ordering of events
and the analysis of activity-on-edge networks are two important applications of this problem.
Analysis of an aoe network involves project time calculation and identifying the critical
activities. Path construction and walk construction are similar. The only difference is in the
occurrence of the nodes. A path can use a node at most once in the entire sequence. On
the other hand, a walk can have a node any number of times. The absence of loops
in the directed acyclic graph ensures that the path construction is as simple as the walk
construction. Hence, the computation of the shortest and longest paths can be done by similar
algorithms, and their complexity is the same on directed acyclic graphs. We attempted to reduce
the computational cost of the parallel algorithm for the path problem on dags. The second
section of the chapter talks about the topological ordering problem, which is a path problem
on unit weighted dags. This problem is a simpler problem than the shortest or longest path
problem on the weighted dags. The third section of the chapter discusses a known parallel
algorithm which is based on the tree merge technique. The fourth and final section describes two new
algorithms for the path problem.

3.2 Topological ordering

The topological ordering problem is the longest path computation problem on unweighted
directed acyclic graphs. A topological ordering of the nodes of a dag is a linear ordering of
the nodes with the property that if node I is a predecessor of node J in the dag then node
I precedes node J in the linear ordering. This problem can be solved by the general matrix
multiplication technique. But the number of processors required for such an implementation
will be n^3. The number of processors can be appreciably reduced by using Strassen's [ST 89]
recursive multiplication technique. With this technique the number of processors employed
can be reduced to O(n^2.81). The performance of this algorithm is asymptotically better than the
earlier algorithm.
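
The matrix formulation can be illustrated with a small sequential sketch. It assumes
nodes numbered 0 to n-1 and an edge list as input; the function names are hypothetical, and
the plain triple loop below stands in for the parallel (max, +) matrix product rather than
for Strassen's technique. Each squaring doubles the number of edges a path may use, so
about log n squarings give the longest path length between every pair of nodes, and
sorting the nodes by the longest path that ends at them yields a topological ordering.

```python
NEG = float("-inf")  # marks "no path found yet"


def longest_path_lengths(n, edges):
    """d[i][j] = number of edges on the longest path from i to j in an unweighted dag."""
    d = [[NEG] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for u, v in edges:
        d[u][v] = 1
    covered = 1  # paths with at most `covered` edges are accounted for
    while covered < n - 1:
        # One (max, +) squaring; on a PRAM every (i, j, k) triple gets a processor.
        d = [[max(d[i][k] + d[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        covered *= 2
    return d


def topological_order(n, edges):
    """Order the nodes by the length of the longest path ending at each node."""
    d = longest_path_lengths(n, edges)
    level = [max(d[i][j] for i in range(n)) for j in range(n)]
    return sorted(range(n), key=lambda j: (level[j], j))
```

If node I is a predecessor of node J, the longest path ending at J is strictly longer than
the one ending at I, so I is placed before J.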

3.3 Longest path Algorithms 

3.3.1 Definitions and Notations

In this section a number of definitions and notations are introduced. The term 'graph'
refers to a directed acyclic graph. Given a node x in a graph G, a tree containing all nodes
reachable from x through a path consisting of 2^j or fewer arcs, where j is an
integer satisfying 0 <= j <= log n, is denoted by T(x, j). The parent of a node z ≠ x in
tree T(x, j), denoted by Parent(z | T(x, j)), is a node y if there exists an arc y → z in G.
The tree T(x, 0) contains all nodes y such that x → y is an arc in G. Dist(y | T(x, j))
denotes the longest possible distance from node x to y on the graph using at most 2^j edges.
Level(y | T(x, j)) denotes the edge distance of the node y from x when at most 2^j edges
are used for the traversal. Select(y | T(x, j)) indicates that y is a terminal at level 2^j and
is selected for tree merge.

3.3.2 Known algorithm

The longest path on a dag network can be calculated by log n applications of the tree merge
algorithm. The longest path algorithm takes unit height trees as input, which are nothing but
the sets of outgoing edges for each node in the network. The tree merging algorithm produces
T(x, j + 1) from each tree T(x, j), for x belonging to a graph G. The tree T(x, j + 1) is
obtained from the tree T(x, j) and the trees T(t, j) where t is a terminal node in T(x, j).
The nodes of G are identified by 1 through n. The set of trees T(x, j), for each x belonging
to G, forms the input to the algorithm.

Input :

A tree T(x, j), for each node x in G, specified by Parent(y | T(x, j)), y = 1, 2, ..., n.

Output :

A tree T(x, j + 1), for each node x in G, specified by Parent(y | T(x, j + 1)), y = 1, 2, ..., n.

Step 1 [Compute Distance]

For each pair x, y = 1, 2, ..., n
Set the distance of node y from the root node in the tree of x

Step 2 [Identify terminal nodes]

For each pair x, y = 1, 2, ..., n

Identify all terminal nodes t_p, p = 1, 2, ..., r such that the two conditions are satisfied

1. Dist(y | T(x, j)) < Dist(y | T(t_p, j)) + Dist(t_p | T(x, j))

2. Parent(y | T(t_p, j)) ≠ 0 and Parent(t_p | T(x, j)) ≠ 0

For any other terminal node s ≠ t_p of T(x, j) such that Parent(y | T(s, j)) ≠ 0

Step 3 [Select terminal node]

For each pair x, y = 1, 2, ..., n

Find the node t* element of {t_p | p = 1, 2, ..., n} such that

Dist(t* | T(x, j)) + Dist(y | T(t*, j)) = Max {Dist(t_p | T(x, j)) + Dist(y | T(t_p, j))}

Step 4 [Set parent node]

For each pair x, y = 1, 2, ..., n

Set Locate(y | T(x, j)) = t*

Set Parent(y | T(x, j + 1)) = Parent(y | T(Locate(y | T(x, j)), j)).

Longest path algorithm 

1. k = 1

2. Merge the k-th level trees to produce the (k + 1)-th level trees

3. k = k + 1

4. Repeat steps 2 and 3 until k = ⌈log n⌉
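
The merge loop above can be sketched sequentially as follows. This is an illustration
under stated assumptions, not the published algorithm: nodes are numbered from 0, the trees
are held as two hypothetical n by n tables dist and parent (with None playing the role of
the value 0 in the listing), and every node t of T(x, j) is tried as the merge point
instead of only its terminal nodes. The parent update follows Step 4: y takes the parent it
has in the tree of the node t* that maximises the combined distance.

```python
NEG = float("-inf")  # marks "y is not in the tree of x"


def initial_trees(n, weighted_edges):
    """Unit height trees: the tree of x holds the outgoing edges of x."""
    dist = [[NEG] * n for _ in range(n)]
    parent = [[None] * n for _ in range(n)]
    for x in range(n):
        dist[x][x] = 0
    for u, v, w in weighted_edges:
        if w > dist[u][v]:
            dist[u][v] = w
            parent[u][v] = u
    return dist, parent


def merge_once(dist, parent):
    """One tree merge: extend every tree T(x, j) by the trees T(t, j)."""
    n = len(dist)
    new_dist = [row[:] for row in dist]
    new_parent = [row[:] for row in parent]
    for x in range(n):
        for y in range(n):
            for t in range(n):
                candidate = dist[x][t] + dist[t][y]
                if candidate > new_dist[x][y]:
                    new_dist[x][y] = candidate
                    # Parent(y | T(x, j+1)) = Parent(y | T(t*, j))
                    new_parent[x][y] = parent[t][y]
    return new_dist, new_parent


def longest_path_trees(n, weighted_edges):
    """Apply about log n merges so that paths of up to n - 1 edges are covered."""
    dist, parent = initial_trees(n, weighted_edges)
    for _ in range(max(1, (n - 1).bit_length())):
        dist, parent = merge_once(dist, parent)
    return dist, parent
```

For example, with edges (0,1,2), (1,2,3), (0,2,4), (2,3,1) and (1,3,5) on four nodes, the
row dist[0] becomes [0, 2, 5, 7] and following parent[0] back from node 3 gives the path
0, 1, 3.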

The first step of the tree merge algorithm takes O(log n) time to compute with n^2 processors.
Implementation of this algorithm is simple. We need to compute the distance of each node
from its root node using the pointer doubling technique. Since there are n trees
and each tree contains at most n nodes, the number of processors cannot exceed n^2. The
second step of the algorithm can be executed in O(1) time. In the third step
of the algorithm, the maximum distance is computed for each [node, root] pair. The last
step of the algorithm can be executed in constant time. Hence, the computation of the
longest path takes O(log^2 n) time with O(n^3) processors.

3.4 New algorithms

3.4.1 Algorithm I

The modified tree merge method described here attempts to reduce the number of processors
employed for the computation of the longest and shortest paths on a directed acyclic graph. This
algorithm employs O(ne) processors. This algorithm assumes the PRAM (SM) CREW model
for the computation.

Input :

A tree T(x, j), for each node x in G, specified by Parent(y | T(x, j)), y = 1, 2, ..., n.

Output :

A tree T(x, j + 1), for each node x in G, specified by Parent(y | T(x, j + 1)), y = 1, 2, ..., n.

Step 1 [Prune Tree]

For each pair x, y = 1, 2, ..., n
If Level(y | T(x, j)) > 2^j then Parent(y | T(x, j)) = 0

Step 2 [Compute Distance]

For each pair x, y = 1, 2, ..., n
Set the distance of node y from the root node in the tree of x

Step 3 [Compute Level]

For each pair x, y = 1, 2, ..., n
Set the level of node y from the root node in the tree of x

Step 4 [Select terminals]

For each pair x, y = 1, 2, ..., n
Sort the set of pairs (x, y) where y is terminal. Select one pair for each y in the list,
and set Select(y | T(x, j)) = TRUE; else Select(y | T(x, j)) = FALSE

Step 5 [Identify terminal nodes]

For each pair x, y = 1, 2, ..., n

Identify all terminal nodes t_p, p = 1, 2, ..., r such that the three conditions are satisfied

1. Dist(y | T(x, j)) < Dist(y | T(t_p, j)) + Dist(t_p | T(x, j))

2. Parent(y | T(t_p, j)) ≠ 0 and Parent(t_p | T(x, j)) ≠ 0

3. Select(t_p | T(x, j)) is TRUE

For any other terminal node s ≠ t_p of T(x, j) such that Parent(y | T(s, j)) ≠ 0

Step 6 [Select terminal node]

For each pair x, y = 1, 2, ..., n

Find the node t* element of {t_p | p = 1, 2, ..., n} such that

Dist(t* | T(x, j)) + Dist(y | T(t*, j)) = Max {Dist(t_p | T(x, j)) + Dist(y | T(t_p, j))}

Step 7 [Set parent node]

For each pair x, y = 1, 2, ..., n

Set Locate(y | T(x, j)) = t*

Set Parent(y | T(x, j + 1)) = Parent(y | T(Locate(y | T(x, j)), j)).

Any node which is away from the root of its tree by a distance greater than 2^j is
removed from the trees. This is done to ensure that the terminal nodes which are used for
the construction of trees of height 2^(j+1) are exactly 2^j edges away from the root node with
respect to the j-th level trees. The distance and level of the nodes are calculated on the
pruned tree. Distance and level calculation can be implemented with the pointer doubling
technique for the trees. The next step of the algorithm is to identify a list of terminal
nodes where the trees will be grown. This step of the algorithm lists all possible terminal
nodes in the form of [edge number, tree] pairs. These pairs are sorted and e terminal nodes
are identified to construct the next level trees. After the terminal nodes are identified for
merging, the n * e distances are written in a table of size n by e. The (j + 1)-th level maximum
distance is computed for each [tree, node] pair using the n * e sized matrix. For each node,
the parent is changed if the maximum of the (j + 1)-th level is greater than the maximum
of the j-th level.

The first three steps of the algorithm are computed with the pointer doubling technique,
which takes O(log n) time. The fourth step of the algorithm
is to pick out e terminal nodes out of the possible terminal nodes. We can pick out
the e terminal nodes by sorting the pairs and marking one node for each edge number
which is present in the list of terminal nodes. The sorting operation can be done with O(n^2)
processors in O(log n) time. The terminal node selection can be done in constant time
with the same number of processors. The longest path for each node can be selected in O(log n)
time with ne processors. The longest path tree can be computed by repeated application
of the modified tree merge method. This algorithm computes the longest path for the nodes of the
graph, and hence gives the longest path tree, only with respect to the root node of the
directed acyclic graph.
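
The terminal selection by sorting can be sketched as below. The representation is an
assumption made for illustration: each candidate is a (tree, terminal) pair of integers
and the function name is hypothetical. After sorting by terminal, an entry is selected
exactly when its terminal differs from that of its predecessor, a test every processor can
make in constant time on its own entry.

```python
def select_terminals(pairs):
    """Pick one (tree, terminal) pair for every distinct terminal.

    pairs: iterable of (x, y), where y is a terminal node of the tree rooted at x.
    """
    # Sort by terminal first, so that equal terminals become neighbours.
    ordered = sorted(pairs, key=lambda p: (p[1], p[0]))
    return {
        ordered[i]
        for i in range(len(ordered))
        # Selected iff this entry starts a run of equal terminals.
        if i == 0 or ordered[i][1] != ordered[i - 1][1]
    }
```

The sequential sort used here only stands in for the parallel sorting step whose cost is
discussed above.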

3.4.2 Algorithm II

Reversed tree method

Input :

A tree T(x, j), for each node x in G, specified by Parent(y | T(x, j)), y = 1, 2, ..., n.

Output :

A tree T(x, j + 1), for each node x in G, specified by Parent(y | T(x, j + 1)), y = 1, 2, ..., n.

Step 1 [Construct intermediate graph, IG(0)]

For each triplet x, y, z = 1, 2, ..., n
If edge y → z is present in T(x, j) then enter edge y → z in IG(0)

Step 2 [Duplicate the intermediate graph]

For each triplet x, y, z = 1, 2, ..., n
If edge y → z is present in IG(0) then enter the edge y → z in IG(x)

Step 3 [Connect leaf nodes to roots]

For each pair x, y = 1, 2, ..., n
If Level(y | T(x, j)) = 2^j then enter x → y in IG(x) along with Dist(y | T(x, j))

Step 4 [Compress the linear part]

Using pointer doubling, remove any linear parts in the IG.

Step 5 [Node removal]

Calculate the distance of the terminal nodes and remove them.

Step 6 [Loop]

Repeat Steps 4 and 5 at most ⌈log n⌉ times.

Step 7 [Calculate for every node]

Repeat ⌈log n⌉ times:
reconstruct the linear parts and calculate the distance for the nodes inside the linear
parts.

Step 8 [Building T(x, j + 1)]

For each pair x, y = 1, 2, ..., n
If y is selected in IG(x) then y is connected to T(x, j + 1)

The intermediate graph is constructed to reduce the number of processors required for
the algorithm. The intermediate graph is constructed by taking the edges from all the trees T(x, j).
One intermediate graph is constructed for each tree T(x, j). The distance of the terminal nodes
from the root nodes is calculated in each tree T(x, j) and edges are added between the
terminal nodes and the root in the intermediate graph of the tree T(x, j). The distances of the
terminal nodes are also written along with the edges. The maximum distance of the nodes is
calculated on the intermediate graphs constructed. Although the method described here has
been proved only for structured graphs, we strongly feel that it can be extended to the
general problem as well. The maximum distances on the intermediate graphs are calculated
in steps 4 to 6. This calculation is done in two phases. In the forward phase, the
maximum distance is calculated only for some nodes. In the reverse phase of the algorithm,
the maximum distance is calculated for the nodes which are eliminated during the first
phase. The second phase of the calculation is essentially a replay of the forward phase in
reverse order. When the second phase is executed the distance is known for some of the
nodes, and the distances of the remaining nodes are calculated based on these values.

Intermediate graph construction can be carried out with O(n^2) processors in constant
time. Each tree contains at most n edges and there are only n trees. Hence, the total
number of edges which are to be considered for the intermediate graph construction is only n^2.
Making one copy for each tree T(x, j) takes ne processors. The intermediate graph contains
at most e edges and we want to make n copies of the intermediate graph. This can be done
by a simple data propagation routine which uses n processors for each edge in the intermediate
graph. This implies that the total number of processors required for this stage of the algorithm
is ne. The distance calculation is carried out in all the intermediate graphs in parallel. The
distance calculation technique requires at most e processors for each intermediate graph.
Hence, the distances can be calculated using ne processors in O(log n) time.

4.1 Introduction

A line graph represents the interrelation between the edges of a graph, the root graph. Line
graphs are useful to study the properties of the original graphs. A lot of work has been done
in the area of eulerian and hamiltonian paths on line and root graphs. The points of a line
graph, L(G), represent the edges of the root graph, and if any two edges of the root graph
G are adjacent then the corresponding nodes of the line graph are adjacent. Line graphs
can be constructed for directed graphs too. The line graph of a digraph D is the digraph
L(D) having a vertex l(a) for each arc a of D and an arc (l(a1), l(a2)) for each pair of arcs
a1, a2 of D which are of the form a1 = (u, v) and a2 = (v, w). An input graph H is a line
graph if it is isomorphic to the line graph L(G) of G, where G is a root graph. The graph
G is called the root graph of H, where H is a line graph. The line graph detection problem
has been studied by many people. Lehot [LE 74] has proposed a sequential algorithm to
detect line graphs and to construct their root graphs. Except in a trivial case like the triangle
and the star with three branches, the root graph of a line graph is unique up to isomorphism. One
obvious way to detect line graphs is to look for the forbidden subgraphs (Figure 4.1) inside the input
graph. Beineke has proved that a graph H is a line graph if and only if it does not contain
any of the nine forbidden subgraphs. A sequential algorithm can be designed based on this
characterisation of the line graphs. However, the performance of this algorithm will not be
good.
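
The definition of L(G) for an undirected root graph can be made concrete with a short
sketch. The input format (a list of edges given as node pairs) and the function name are
assumptions for illustration; node i of the line graph stands for the i-th edge of the root
graph.

```python
from itertools import combinations


def line_graph(edges):
    """Return the edge set of L(G) as pairs (i, j), i < j, of root graph edge indices."""
    adjacent = set()
    for i, j in combinations(range(len(edges)), 2):
        # Two edges of G are adjacent when they share an end node.
        if set(edges[i]) & set(edges[j]):
            adjacent.add((i, j))
    return adjacent
```

Applied to the triangle [(0, 1), (1, 2), (0, 2)] and to the star with three branches
[(0, 1), (0, 2), (0, 3)], the function returns the same line graph, a triangle, which is
the exceptional case in which the root graph is not unique.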


4.2 Algorithms 

The algorithm for this problem has two parts. The first part of the algorithm constructs
the root graph from the input graph, assuming that the input graph is a line graph. The
second part generates the line graph from the constructed root graph and checks it
against the input graph. If it differs from the input graph then the input
graph has at least one of the forbidden subgraphs as an induced subgraph. In other words,
the input graph is not a line graph. If the generated line graph is isomorphic to the input
graph then the input graph is a line graph and the constructed graph is its root graph.

4.2.1 Labeling algorithm 

The sequential algorithm is based on a labeling technique. The labeling algorithm is used for
detection of line graphs and construction of their root graphs. Each node of the graph is
labeled with a pair of numbers. If both the numbers of the node are known, then the node
is fully labeled, and if only one number is known the node is called half-named. The
numbering of the nodes of the input gives a way to map the nodes of the input graph to the
edges of the root graph. The line graph can be generated directly without numbering the nodes. If this
method is followed then, even if the input graph is a line graph, it will be difficult to get a
mapping from the nodes of the input graph to the edges of the root graph.

Step 1

Pick up two adjacent nodes and name them 1-2 and 2-3. They are the two basic nodes to
start with.

Step 2

Find all nodes adjacent to both the basic nodes.

Case I [If there is only one such node]

Call it x. If there is a node adjacent to one node of the triangle without being adjacent
to any other node of the triangle, then x = 2-4. Otherwise, x = 1-3.

Case II [If there are two adjacent nodes]

Call the two adjacent nodes x and y.

1. If x and y are adjacent, there is no cross node and go to step 6.

2. If x and y are not adjacent, they constitute two triangles with the basic nodes, and
we find out if there is an odd triangle. If there is one, the corresponding summit,
say x, is 2-4, and then y will be named 1-3 and will be the
cross node.

Case III [If there is a group of adjacent nodes]

If there is a node a of the group which is not adjacent to a certain node b then either
a is the first node of the group under investigation or it is not. In the first case, the tie
is broken by the adjacency of a to a third node of the group: if it is adjacent,
then b is declared the cross node. If it is not, then a is declared the cross node.

Step 3

All the nodes of the clique are named and the cross node, if any, is fully named.

Step 4

The nodes of the clique are used to half-name successively the "not-yet-fully-named"
nodes which are adjacent. Then the two associated cliques are disposed of, and the
process takes place as usual.

Step 5

If there is a half-named node adjacent to a fully-named node:
set the half-named node and the fully-named node to basic nodes and go to Step 2.

Step 6

Mark the edges of L(G), the line graph of G, which are in H, until one edge of H is
not in L(G). If this happens: Exit - H is not a line graph. If this does not happen: Exit - H
is a line graph.

The time complexity of the algorithm is easy to establish. The edges of the line graph are
scanned at most twice. A node is half-named when it is scanned for the first time and the
same node is fully named when it is scanned for the second time. A constant amount of work
is done for half-naming and fully naming the nodes. Hence, the worst case time complexity of
the algorithm is O(e).

4.2.2 Parallel algorithm 

Each node of the root graph is represented by a clique in the line graph. The algorithm recreates
the root graph from the line graph by mapping the cliques in the graph to the nodes of
the root graph. The algorithm constructs cliques in two phases. At most 2n cliques are
constructed to detect the root graph. If the input graph is a line graph then the constructed
intersection graph is nothing but the root graph of the given line graph. Otherwise, the
intersection graph is not a root graph. The cross nodes are possible only in the first set
of cliques. The cross nodes are detected by checking for the odd and even triangles in the
nodes of the cliques. Triangle checking can be eliminated by constructing the cliques twice,
and the cross nodes can be detected by counting the occurrence of the nodes in the cliques.
The algorithm employs O(n^2) processors to detect the line graph and to construct the root
graph of the input line graph. The detection algorithm works in O(log n) time.

Step 1 

For each node in the graph select a neighboring input node, i.e., n pairs of nodes are
selected.

Step 2

For each pair (x, y) find all z such that both x and y are adjacent to z. Each group
forms a clique in the input graph and corresponds to a node in the root graph.

Step 3

For each clique formed

1. Find the cross node, if any, and remove it from the clique.

2. Order the nodes in the clique in ascending order and select the smallest three
numbers in each group. Number the cliques with their selected nodes.

Step 4

Sort the formed cliques by their number and remove duplicate entries.

Step 5

1. Collect the remaining neighbors of the first node of each clique and form new cliques.

2. The nodes of each new clique are sorted in ascending order and the first three numbers
are used to number the cliques.


Step 6

The intersection graph of the cliques, which is the root graph, is constructed.

Step 7

The line graph is generated from the root graph and checked for isomorphism.
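
Steps 1 and 2 can be sketched as follows. The adjacency representation (a dictionary
mapping each node to the set of its neighbours), the choice of the smallest neighbour in
Step 1 and the function name are assumptions for illustration. Cross node removal (Step 3)
is not included, so on inputs that contain cross nodes a returned group need not yet be a
clique.

```python
def candidate_cliques(adj):
    """Group every selected pair (x, y) with all nodes z adjacent to both x and y."""
    groups = set()
    for x in sorted(adj):
        if not adj[x]:
            continue  # an isolated node has no neighbour to pair with
        y = min(adj[x])           # Step 1: select one neighbour (smallest, by assumption)
        common = adj[x] & adj[y]  # Step 2: every z adjacent to both x and y
        groups.add(frozenset({x, y} | common))
    return groups
```

For the line graph of a star with three branches plus one pendant edge, written as
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}, the sketch returns the groups
{0, 1, 2} and {2, 3}, which correspond to the two root graph nodes of degree greater
than one.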



Chapter 5


Concluding remarks 

In this thesis three types of problems have been considered. The first one is the problem
of evaluating a set of expressions. It was shown that the set can be evaluated with less
computational cost if there exist some common subexpressions. The evaluation method
assigns one processor for each unique subexpression in the set. This reduces the processor
requirement for the computation. One possible extension to this method is applying node
coloring algorithms to identify a set of non-overlapping nodes.

The tree contraction method evaluates the expression by adjusting the linear expressions
stored at various nodes after every rake operation. This ensures that the tree, throughout
the computation process, represents the same value. Evaluation of the linear expression
involves the simplification of expressions, which makes use of the algebraic properties of the
operators. One can attempt to devise a general algorithm which can work with any
combination of operators. This can be achieved if one postpones the linear expression computation
till it is actually required, by storing an expression tree at each of the
nodes in the network. The computation must then be done in the second phase of the algorithm,
where the subexpressions are computed.




The second type of problem, which has been discussed in chapter 3, is the problem of
computing the shortest and the longest paths on the directed acyclic network using the tree
merge algorithm and matrix multiplication. One needs O(n^3) processors for the algorithms
which use matrix multiplication or the normal tree merge algorithm. We have
proposed two algorithms which employ fewer processors. The first one requires
fewer processors. However, we could not prove a polylog time bound for this
algorithm. The empirical performance of this algorithm in terms of the number of parallel steps
is close to the performance of the O(n^3) algorithms. The second algorithm was designed for
a class of directed acyclic graphs. It computes the shortest and longest paths, from a
single source, in O(log^2 n) time using O(ne) processors.

The third problem, which is discussed in chapter 4, is the problem of detecting line
graphs and constructing the root graph if the input graph is a line graph. An algorithm
has been proposed which requires O(n^2) processors to compute in O(log n) time on a CREW
machine. The algorithm described for the undirected graphs can be easily modified for
the directed graphs.